Multi-word Expressions in English-Latvian Machine Translation

Author

  • Inguna SKADIŅA
Abstract

The paper presents a series of experiments that aim to find the best method for treating multi-word expressions (MWEs) in machine translation. The methods were investigated within a statistical machine translation (SMT) framework for translation from English into Latvian. MWE candidates were extracted using pattern-based and statistical approaches. Different techniques for integrating MWEs into an SMT system are analysed. The best result, +0.59 BLEU points, was achieved by combining two phrase tables: a bilingual MWE dictionary and a phrase table created from the parallel corpus in which statistically extracted MWE candidates are treated as single tokens. When only a bilingual dictionary is used as an additional source of information, the best result (+0.36 BLEU points) is obtained by combining two phrase tables. For statistically obtained MWE lists, the best result (+0.51 BLEU points) is achieved with the largest list of MWE candidates.
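To make the idea of treating MWE candidates as single tokens concrete, the following is a minimal sketch of one way such preprocessing could be done before phrase-table training. It is not taken from the paper: the function name `join_mwes`, the underscore joiner, the case-insensitive matching, and the greedy longest-match strategy are all assumptions made for illustration.

```python
def join_mwes(tokens, mwe_list, joiner="_"):
    """Merge known multi-word expressions into single tokens.

    Hypothetical helper: uses a greedy, longest-match-first,
    case-insensitive lookup against a list of MWE candidates.
    """
    # Store each MWE as a tuple of lowercased words for fast lookup.
    mwes = {tuple(m.lower().split()) for m in mwe_list}
    max_len = max((len(m) for m in mwes), default=1)

    out = []
    i = 0
    while i < len(tokens):
        # Try the longest possible window first, down to two words.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            window = tuple(t.lower() for t in tokens[i:i + n])
            if window in mwes:
                out.append(joiner.join(tokens[i:i + n]))
                i += n
                break
        else:
            # No MWE starts at this position; keep the token as-is.
            out.append(tokens[i])
            i += 1
    return out


sentence = "the European Union adopted a point of view".split()
candidates = ["European Union", "point of view"]
print(join_mwes(sentence, candidates))
```

Applying the same merge to the source side of a parallel corpus would yield training data in which each listed MWE is a single token; how the paper's system handled the target side and alignment is not stated in the abstract.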

Similar resources

Multi-system machine translation using online APIs for English-Latvian

This paper describes a hybrid machine translation (HMT) system that employs several online MT system application program interfaces (APIs) forming a MultiSystem Machine Translation (MSMT) approach. The goal is to improve the automated translation of English – Latvian texts over each of the individual MT APIs. The selection of the best hypothesis translation is done by calculating the perplexity...

Paying Attention to Multi-Word Expressions in Neural Machine Translation

Processing of multi-word expressions (MWEs) is a known problem for any natural language processing task. Even neural machine translation (NMT) struggles to overcome it. This paper presents results of experiments on investigating NMT attention allocation to the MWEs and improving automated translation of sentences that contain MWEs in English→Latvian and English→Czech NMT systems. Two improvemen...

English-Latvian Toponym Processing: Translation Strategies and Linguistic Patterns

The paper presents a study of a challenging task in machine translation and cross-language information retrieval: translation of toponyms. Due to their linguistic and extra-linguistic nature, toponyms deserve a special treatment. The overall translation process includes two stages of processing: dictionary-based and out-of-vocabulary toponym translation. The latter is divided into three steps: s...

Grouping Multi-Word Expressions According To Part-Of-Speech In Statistical Machine Translation

This paper studies a strategy for identifying and using multi-word expressions in Statistical Machine Translation. The performance of the proposed strategy for various types of multi-word expressions (like nouns or verbs) is evaluated in terms of alignment quality as well as translation accuracy. Evaluations are performed by using real-life data, namely the European Parliament corpus. Results f...

Towards Improving English-Latvian Translation: A System Comparison and a New Rescoring Feature

This paper presents a comparative study of two alternative approaches to statistical machine translation (SMT) and their application to a task of English-to-Latvian translation. Furthermore, a novel feature intending to reflect the relatively free word order scheme of the Latvian language is proposed and successfully applied on the n-best list rescoring step. Moving beyond classical automatic s...

Publication date: 2016